Abstract: We analyze the asymptotic performance of sparse signal recovery from noisy measurements. In particular, we generalize some of the existing results for the Gaussian case to subgaussian and other ensembles. An achievability result is presented for the linear sparsity regime. A converse on the number of required measurements in the sub-linear regime is also presented, which covers many of the widely used measurement ensembles. Our converse makes use of a correspondence between compressed sensing and compound channels in information theory.



I. Introduction 

Sparse support recovery has received much attention of late, because many signals of interest are sparse in some basis. We consider the model

$$ y = Ax + z, \tag{1} $$

where $y \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, and the noise $z \in \mathbb{R}^m$ is distributed as $\mathcal{N}(0, \sigma^2 I)$. The support of $x$ is the index set $\mathcal{I} = \mathrm{supp}(x)$, with $|\mathcal{I}| = k$. The signal power is $\|x\|_2^2 = P$. Each column of $A$ is normalized to have unit $\ell_2$ norm.
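As an illustration of model (1), the following minimal sketch draws one instance of the measurements. The function name `simulate_measurements`, the equal-amplitude choice for the non-zero entries, and the use of a random sign matrix are assumptions made only for this example.

```python
import numpy as np

def simulate_measurements(n, m, k, sigma, amplitude=1.0, seed=0):
    """Draw one instance of y = A x + z with a k-sparse x and unit-norm columns."""
    rng = np.random.default_rng(seed)
    # Random sign matrix scaled by 1/sqrt(m): every column has unit l2 norm.
    A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
    # Support: k indices chosen uniformly at random without replacement.
    support = np.sort(rng.choice(n, size=k, replace=False))
    x = np.zeros(n)
    x[support] = amplitude  # illustrative choice: equal non-zero values
    z = sigma * rng.standard_normal(m)  # noise z ~ N(0, sigma^2 I)
    y = A @ x + z
    return y, A, x, support

y, A, x, support = simulate_measurements(n=50, m=20, k=3, sigma=0.1)
```

With these settings the signal power is $P = k \cdot \text{amplitude}^2 = 3$.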

Our main motivation in this paper is to study a wider class 
of measurement matrices. Previous studies have specifically focussed on the Gaussian measurement matrix [1], [13]. Two distinct sparsity regimes are often considered in the literature:

• Sublinear: $k/n \to 0$ as both $k, n \to \infty$, and

• Linear: $k = \rho n$ for $\rho \in (0, 1)$.

The following three performance estimates were studied in [1], 
[13]. 

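• Error metric 1:

$$ d_1(\hat{x}, x) = \mathbb{1}\big( \{\hat{x}_i \neq 0 \ \forall i \in \mathcal{I}\} \cap \{\hat{x}_j = 0 \ \forall j \notin \mathcal{I}\} \big) $$

• Error metric 2:

$$ d_2(\hat{x}, x) = \mathbb{1}\left( \frac{|\mathrm{supp}(\hat{x}) \cap \mathcal{I}|}{|\mathcal{I}|} > 1 - \alpha \right) $$

• Error metric 3:

$$ d_3(\hat{x}, x) = \mathbb{1}\left( \sum_{k \in \mathrm{supp}(\hat{x}) \cap \mathcal{I}} |x_k|^2 > (1 - \delta) P \right) $$

where $\mathcal{I} = \mathrm{supp}(x)$, $\mathbb{1}(\cdot)$ is the binary-valued indicator function, which is unity when its argument is true, and $\alpha, \delta$ are in $(0, 1)$. In Section II we focus on subgaussian measurement matrices.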
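The three metrics can be sketched in code as follows. The forms used for metrics 2 and 3 are assumptions: they follow the definitions in [1], with $\alpha$ and $\delta$ as the tolerances. All function names are illustrative. Each function returns 1 when recovery counts as successful under that metric.

```python
import numpy as np

def support(v):
    """Index set of the non-zero entries of v."""
    return set(np.flatnonzero(v).tolist())

def metric1(x_hat, x):
    # Exact support recovery: non-zero on the support and zero elsewhere.
    return int(support(x_hat) == support(x))

def metric2(x_hat, x, alpha):
    # Assumed form: fraction of the true support recovered exceeds 1 - alpha.
    true_support = support(x)
    return int(len(support(x_hat) & true_support) / len(true_support) > 1 - alpha)

def metric3(x_hat, x, delta):
    # Assumed form: recovered indices carry more than (1 - delta) of the power P.
    common = np.array(sorted(support(x_hat) & support(x)), dtype=int)
    power = float(np.sum(x ** 2))
    return int(float(np.sum(x[common] ** 2)) > (1 - delta) * power)

x = np.array([2.0, 0.0, 1.0, 0.0])
x_hat = np.array([1.9, 0.0, 0.0, 0.0])  # finds the large entry, misses the small one
```

In this example the estimate fails metric 1, but passes metric 3 for $\delta = 0.3$ because the recovered index carries 4 of the 5 units of signal power.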

Definition 1.1: A random variable $x$ is subgaussian if there is a constant $B > 0$ such that

$$ \Pr(|x| > t) \le 2 \exp(-t^2 / B^2) $$

for all $t > 0$. The smallest such $B$ is called the subgaussian moment of $x$.

An example of a subgaussian measurement matrix is the matrix with i.i.d. entries $\pm 1/\sqrt{m}$, each sign drawn according to Bernoulli(1/2).
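To make Definition 1.1 concrete, the sketch below checks the tail bound for an unscaled random sign variable, uniform on $\{-1, +1\}$. For this variable $\Pr(|x| > t)$ equals 1 for $t < 1$ and 0 otherwise, so the bound holds for all $t$ exactly when $B \ge 1/\sqrt{\ln 2}$. The helper names and the finite grid of $t$ values are assumptions of the example.

```python
import numpy as np

def rademacher_tail(t):
    """Exact Pr(|x| > t) for x uniform on {-1, +1}."""
    return 1.0 if t < 1.0 else 0.0

def satisfies_definition(B, grid):
    """Check Pr(|x| > t) <= 2 exp(-t^2 / B^2) at every t in the grid."""
    return all(rademacher_tail(t) <= 2.0 * np.exp(-t ** 2 / B ** 2) for t in grid)

grid = np.linspace(0.01, 3.0, 300)
B_min = 1.0 / np.sqrt(np.log(2.0))  # about 1.2011
```

A grid check is only a numerical illustration, not a proof. Here the exact tail is available in closed form, so the threshold value of $B$ can also be verified by hand.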

We show that centered subgaussian measurement matrices achieve the same asymptotic results as Gaussian measurement matrices in the linear sparsity regime, i.e., $m = O(k)$ measurements suffice for signal recovery. For the linear case, we take the pessimistic point of view that good measurement (sensing) schemes should have an error probability that decays exponentially in the number of measurements, which also has a bearing on practical constructions. On the other hand, if we take the optimistic viewpoint (see [5]) that a sub-exponential decay in error is acceptable, our analysis remains valid for the sub-linear regime as well.

In Section III we present some converse results, which lower-bound the number of measurements required for asymptotically exact support recovery. Our converse results give the required scaling of $m$ with respect to $n$ and $k$ in both regimes. Specifically, we invoke a correspondence between compressed sensing schemes and compound channels in information theory. Here we consider general measurement matrices, and the underlying assumptions are mild.

II. Achievability 

Our setup for achievability is similar to [1]. In particular, we extend Theorems 2.1, 2.5 and 2.9 from [1], which give the number of measurements needed with Gaussian measurement matrices for the error metrics considered in this paper. For Gaussian measurement matrices, the number of measurements required for all three error metrics in the linear sparsity regime is $m = O(k)$, where the hidden constant differs for each error metric. For completeness, we state Theorem 2.1 from [1] here.

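Theorem 2.1 (Achievability for error metric 1): Let a sequence of sparse vectors $\{x^{(n)} \in \mathbb{R}^n\}$ (the superscript denotes the dependence on $n$) with $|\mathrm{supp}(x^{(n)})| = k = \lfloor \rho n \rfloor$ be given. Then asymptotically reliable recovery is possible for $\{x^{(n)}\}$ with respect to error metric 1 if

$$ \frac{k \mu^4(x^{(n)})}{\log k} \to \infty \quad \text{as } k \to \infty \qquad \text{and} \qquad m > c_1 k, $$

where $\mu(x) = \min_{i \in \mathcal{I}} |x_i|$ and $c_1$ is a constant depending on $\rho$, $\mu(x)$ and $\sigma$.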

Our result here shows that these results also apply to subgaussian measurement matrices.

On the other hand, in the sublinear sparsity regime, the number of measurements required is of the order $m = O(k \log(n - k))$ for all three error metrics for Gaussian measurement matrices. As mentioned earlier, if we take the optimistic viewpoint, then subgaussian measurement matrices also achieve the same performance as their Gaussian counterparts. The Lasso scheme was shown to perform optimally in the sublinear regime [12], but the results show a significant gap in the performance of Lasso in the linear regime.

Let $\mathcal{D}(y)$ be a decoder, which outputs a set of indices, depending on the problem objective. Our achievability results show the existence of asymptotically good measurement matrices. Similar to the random coding arguments in information theory, the average error probability attained by using random measurement matrices chosen from an ensemble can be made arbitrarily small asymptotically. However, good matrices are not explicitly identified.

The probability of decoding error for $\mathcal{D}$, averaged over all measurement matrices $A$, is defined as

$$ P_{\mathrm{err}}(\mathcal{D} \mid x) = \mathbb{E}_A\big(P_{\mathrm{err}}(A \mid x)\big) = \mathbb{E}_A\big(\Pr(\mathcal{D}(y) \neq \mathcal{I})\big). $$

We focus on decoders using joint typicality. We define the projection matrix onto the column space of $B$ as $\Pi_B = B(B^T B)^{-1} B^T$. The projection onto the orthogonal complement is $\Pi_B^{\perp} = I - B(B^T B)^{-1} B^T$.
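The two projections can be computed numerically as in the sketch below, which assumes $B$ has full column rank. The function name `projections` is illustrative, and a linear solve is used in place of an explicit matrix inverse.

```python
import numpy as np

def projections(B):
    """Return (Pi_B, Pi_B_perp) for a full-column-rank matrix B."""
    # Solve (B^T B) X = B^T instead of forming the inverse explicitly.
    P = B @ np.linalg.solve(B.T @ B, B.T)
    return P, np.eye(B.shape[0]) - P

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 2))
P, P_perp = projections(B)
```

Both matrices are symmetric and idempotent, $\Pi_B^{\perp}$ annihilates the columns of $B$, and its trace equals $m - k$ (here $6 - 2 = 4$), which is the number of degrees of freedom that appears in the typicality test.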

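Definition 2.2 (Joint Typicality) [1]: The noisy observation vector $y$ and a set of indices $J \subset \{1, 2, \ldots, n\}$, with $|J| = k$, are $\delta$-jointly typical if $\mathrm{rank}(A_J) = k$ and

$$ \left| \frac{1}{m} \|\Pi_{A_J}^{\perp} y\|^2 - \frac{m - k}{m} \sigma^2 \right| < \delta. $$

Denote the events

$$ \Omega_J = \{ y \text{ and } J \text{ are } \delta\text{-jointly typical} \} \quad \text{and} \quad \Omega_0 = \{ \mathrm{rank}(A_{\mathcal{I}}) < k \}. $$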

The decoder has three sources of error:

• the decoder searches incorrect subspaces, event $\Omega_0$,

• the true support set $\mathcal{I}$ is not $\delta$-jointly typical, event $\Omega_{\mathcal{I}}^c$, and

• the decoder recovers another support set $J$ such that $J \neq \mathcal{I}$, event $\Omega_J$.

Hence, the decoder error is upper-bounded by the union bound over the three sources of error,

$$ P_{\mathrm{err}}(\mathcal{D} \mid x) \le \Pr(\Omega_0) + \Pr(\Omega_{\mathcal{I}}^c) + \sum_{J \neq \mathcal{I},\, |J| = k} \Pr(\Omega_J). \tag{2} $$

It suffices to find bounds on each error probability that vanish asymptotically as $n \to \infty$. We show this below.
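A brute-force version of the joint typicality test can be sketched as follows. It assumes the typicality statistic is $|\frac{1}{m}\|\Pi_{A_J}^{\perp} y\|^2 - \frac{m-k}{m}\sigma^2| < \delta$ and enumerates all $\binom{n}{k}$ candidate supports, so it is feasible only for very small problems. The function names and the demonstration parameters are illustrative choices.

```python
import itertools
import numpy as np

def residual_energy(A_J, y):
    """||Pi_perp_{A_J} y||^2 via least squares (A_J assumed full column rank)."""
    coef, *_ = np.linalg.lstsq(A_J, y, rcond=None)
    r = y - A_J @ coef
    return float(r @ r)

def jointly_typical_sets(A, y, k, sigma, delta):
    """Return every size-k index set J that is delta-jointly typical with y."""
    m, n = A.shape
    typical = []
    for J in itertools.combinations(range(n), k):
        A_J = A[:, list(J)]
        if np.linalg.matrix_rank(A_J) < k:
            continue  # rank-deficient sets are never typical
        stat = residual_energy(A_J, y) / m - (m - k) / m * sigma ** 2
        if abs(stat) < delta:
            typical.append(J)
    return typical

rng = np.random.default_rng(0)
n, m, k, sigma, delta = 10, 100, 2, 1.0, 1.0
A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)  # unit-norm sign columns
true_support = (2, 7)
x = np.zeros(n)
x[list(true_support)] = 20.0  # strong signal, so wrong supports leave a large residual
y = A @ x + sigma * rng.standard_normal(m)
found = jointly_typical_sets(A, y, k, sigma, delta)
```

The decoder succeeds when the true support is the only typical set. The three error events in the union bound correspond to the rank check failing, the true set being rejected, and a wrong set being accepted.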



are i.i.d. copies of $X$. Then there are positive constants $c_1, c_2$ (depending polynomially on $B$) such that for any $t > 0$,

$$ \Pr\big( s_k(\mathbf{X}) \le t (\sqrt{m} - \sqrt{k - 1}) \big) \le (c_1 t)^{m - k + 1} + e^{-c_2 m}, $$

where $s_k(\mathbf{X})$ denotes the smallest singular value of $\mathbf{X}$.

In particular, the above lemma suggests that for subgaussian matrices, there is an exponentially small positive probability that $s_k = 0$. We use this in the following result.

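Theorem 2.4: Assume $m > k$. Given an index set $\mathcal{I} \subset \{1, 2, \ldots, n\}$ with $|\mathcal{I}| = k$,

$$ \Pr(\mathrm{rank}(A_{\mathcal{I}}) < k) \le e^{-c_0 m} $$

for some constant $c_0 > 0$.

Proof: To ensure recovery of $x$, it is essential that $\mathrm{rank}(A_{\mathcal{I}}) = k$ or, equivalently, that the smallest singular value satisfies $s_k(A_{\mathcal{I}}) \neq 0$. Using Lemma 2.3 and letting $t \to 0$, we have

$$ \Pr(s_k(A_{\mathcal{I}}) = 0) = \lim_{t \to 0} \Pr\big( s_k(A_{\mathcal{I}}) \le t (\sqrt{m} - \sqrt{k - 1}) \big) \le e^{-c_0 m}. $$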

Remark 2.5: Reference [1] uses the fact that if $A$ has i.i.d. $\mathcal{N}(0, 1)$ entries, then $\Pr(\mathrm{rank}(A_{\mathcal{I}}) < k) = 0$, i.e., $A_{\mathcal{I}}$ is full rank with probability one. For subgaussian matrices, such an error can occur. For example, with random sign matrices distributed according to Bernoulli(1/2), it is easy to see that



Pr(rank(Ai) < fc) > 



Hence, Theorem 2.4 says that in the linear regime, the error probability for the event $\Omega_0$ decays exponentially in the number of measurements $m > k$. However, a sub-exponential decay to zero can be achieved even in the sublinear case. The rest of our arguments are valid for both cases.
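The point of Remark 2.5, that random sign matrices are rank deficient with positive probability, can be checked by exhaustive enumeration for very small sizes. The sketch below is illustrative only: it enumerates all $2^{mk}$ sign patterns, and the function name is an assumption. Scaling the entries by $1/\sqrt{m}$ does not change the rank, so unscaled signs are used.

```python
import itertools
import numpy as np

def prob_rank_deficient(m, k):
    """Exact Pr(rank < k) for an m x k matrix with i.i.d. uniform +/-1 entries."""
    deficient = 0
    total = 0
    for entries in itertools.product([-1.0, 1.0], repeat=m * k):
        M = np.array(entries).reshape(m, k)
        deficient += int(np.linalg.matrix_rank(M) < k)
        total += 1
    return deficient / total
```

For $k = 2$ the matrix is rank deficient exactly when the second column equals plus or minus the first, which happens with probability $2 \cdot 2^{-m}$. The enumeration reproduces this value, which is positive but decays exponentially in $m$.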

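We first modify Lemma 3.3 from [1] by introducing conditions under which the result is still valid. We then show that subgaussian measurement matrices satisfy these conditions.

Lemma 2.6: 1) Let $\mathcal{I} = \mathrm{supp}(x)$ and assume that $\mathrm{rank}(A_{\mathcal{I}}) = k$. Then for $\delta > 0$,

$$ \Pr\left( \left| \frac{1}{m} \|\Pi_{A_{\mathcal{I}}}^{\perp} y\|^2 - \frac{m - k}{m} \sigma^2 \right| > \delta \right) \le 2 \exp\left( - \frac{\delta^2}{4 \sigma^4} \cdot \frac{m^2}{m - k + \frac{2\delta}{\sigma^2} m} \right). $$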

A. Proof of Achievability

We first bound the probability that $\Omega_0$ occurs by using the following result [11, Theorem 1.1].

Lemma 2.3: Let $X$ be a subgaussian random variable with zero mean, variance one and subgaussian moment $B$. Let $\mathbf{X} \in \mathbb{R}^{m \times k}$, $m > k$, be the random matrix whose entries



This result holds for any measurement matrix $A$.
2) Let $J$ be an index set such that $|J| = k$ and $|\mathcal{I} \cap J| = p < k$, where $\mathcal{I} = \mathrm{supp}(x)$, and assume that $\mathrm{rank}(A_J) = k$. Let

$$ V = \frac{\|\Pi_{A_J}^{\perp} y\|^2}{\sigma_J^2} - (m - k), $$

where $\sigma_J^2 = \sum_{i \in \mathcal{I} \setminus J} x_i^2 + \sigma^2$. Then $y$ and $J$ are $\delta$-jointly typical with probability



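$$ \Pr\left( \left| \frac{1}{m} \|\Pi_{A_J}^{\perp} y\|^2 - \frac{m - k}{m} \sigma^2 \right| < \delta \right) \le 2 \exp\left( - \frac{1}{2\gamma_1} \left( (m - k) \left( 1 - \frac{\sigma^2}{\sigma_J^2} \right) - \frac{\delta}{\sigma_J^2} m \right)^2 \right) $$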
if the moment condition

$$ \log \mathbb{E}[e^{tV}] \le -\gamma_1 t - \frac{\gamma_1}{\gamma_2} \log(1 - \gamma_2 t) \tag{3} $$

is satisfied with constants $\gamma_1, \gamma_2 > 0$ for $t < 1/\gamma_2$.
Proof: The proof of the first item is the same as that of the first part of [1, Lemma 3.3]. We have



and 



\iex\J 

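It can be seen that, by the properties of symmetric projection matrices, $(\Pi_{A_{\mathcal{I}}}^{\perp})^T \Pi_{A_{\mathcal{I}}}^{\perp} = \Pi_{A_{\mathcal{I}}}^{\perp}$. Furthermore, $z$ is independent of the entries of $\Pi_{A_{\mathcal{I}}}^{\perp}$. Hence by [8, Chapter 18],

$$ \frac{1}{\sigma^2} \|\Pi_{A_{\mathcal{I}}}^{\perp} z\|^2 \sim \chi^2_{m - k}. $$

By using concentration inequalities for chi-squared random variables around their degrees of freedom ($m - k$ here), as in [1, Lemma 3.3], the same result is obtained.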
For the second part of the lemma, we have 

m — k 



Pr 



-lini v|ll- 



< 5 



Pr(-||ni,y|ll-^-^<^ 

TO TO, 



PM^Iini^y 



TO 



TO 



TO 

Pr ( < -(to -k)[\- ^) + -^TO 



a. 



V 



+ Pr > -(to - fc) ^1 - - -^m 

Using Chernoff's bound and the moment condition, it can be shown that for any $\lambda > 0$ (see Appendix),



Vxiy > 72A + V271A) < Pr(y > V271A) < e 



(4) 



and 



Pr(F< < e-\ (5) 
We bound the first probability by choosing in equation (|4]l. 



Ai = [(m -k)[\~ ^) - -^m 
271 V V ^« 

and for the second. 



in equation dU, we have 



Pr 



-||ni,y|| 

TO 



< (5 < 2exp(-Ai). 



since Ai < A2. ■ 
Theorem 2.7: Subgaussian measurement matrices satisfy Lemma 2.6 with $\gamma_1 = m - k$ and $\gamma_2 = 2$.

Proof: We only need to show that subgaussian measurement matrices satisfy Lemma 2.6, part 2). We first note that subgaussian random variables are closed under addition. Hence, the vector

$$ y = \sum_{i \in \mathcal{I}} x_i a_i + z $$

is still subgaussian since, for some constant $a$ [9],
E[e*y] < exp I ^( J2 + I < cxp 

\ iGl\J J ^ 

where 1 is the column vector of Is, a' > is a constant and 

2 



V 



E 



2 , 2 
X, + a . 



Note that the vector is independent of the entries of $\Pi_{A_J}^{\perp}$. Since $\Pi_{A_J}^{\perp}$ is symmetric and idempotent, we rewrite



|ni,y|| 



To bound the moment, we require an estimate using [9, Lemma 1.2]: for $0 < t < 1/(2a')$,

$$ \mathbb{E}[\exp(tV)] \le e^{-t(m - k)} (1 - 2t)^{-(m - k)/2}. $$

Note that the upper bound is the moment generating function of a centered $\chi^2_{m-k}$ random variable.

The function $\log \mathbb{E}[\exp(tV)]$ is monotonically decreasing in $t < 0$, and at $t = 0$ we have $\log \mathbb{E}[\exp(tV)] \le 0$. On the other hand, the function $(m - k) t^2$ is monotonically increasing for $t < 0$. As such, we have $\log \mathbb{E}[\exp(tV)] \le (m - k) t^2$ for $t < 0$. Hence, it can be easily seen that $\gamma_1 = m - k$ and $\gamma_2 = 2$. ■

2 

With 71 = TO — A;, 72 = 2, (7 = 1 — ^ and 6' = Sm/ {ni — k). 



Pr{flj) < 2 exp 
< 2 exp 

= 2 exp 




A2 = ^ ( (to - fc) ( 1 



Assuming $\mathrm{rank}(A_J) = k$, the number of subsets $J$ that overlap $\mathcal{I}$ in $p$ indices is upper-bounded by $\binom{k}{p}\binom{n-k}{k-p}$, implying by (2) and Lemma 2.6 that

Perr{T^\^) < exp( — Cqto) + 2 CXp 



k 

p=i 



n ~ k 
k — p 



cxp(- 



4(7'^ m — k 

( E -I 

kei\J 



2<5„ 



(5' 



\kei\J 



We sketch an outline of the rest of the proof here. Only $\Pr(\Omega_J)$ changes depending on the error metric. Let $|\mathcal{I} \cap J| = p$ for some particular set $J$. For error metric 1, we note that $\sum_{k \in \mathcal{I} \setminus J} x_k^2 \ge (k - p)\mu^2(x)$. For error metric 2, since we only need $\Pr(\Omega_J)$ for $p < (1 - \alpha)k$ with $\alpha \in (0, 1)$, we have $\sum_{k \in \mathcal{I} \setminus J} x_k^2 \ge \alpha k \mu^2(x)$ for an error to occur. Finally, for error metric 3, we have $\sum_{k \in \mathcal{I} \setminus J} x_k^2 \ge \delta P$ for an error to occur. The rest of the arguments on bounding the error probability follow the analysis for Gaussian measurement ensembles in [1], in both the linear and sublinear sparsity regimes.

III. Converse on the Number of Measurements

Our starting point is again the signal recovery model in (1). For simplicity, assume that $x$ has $k$ non-zero entries. Furthermore, the entries of $A$ are taken from some alphabet $\mathcal{A}$ and are normalized, i.e., for each column $a_i$,

$$ \frac{1}{m} \|a_i\|^2 = 1, \qquad a_i \in \mathcal{A}^m. \tag{6} $$



Note that the measurement matrix $A$ is specified in advance, without knowledge of the instantaneous realization of $x$. So $A$ depends only on the global properties of $x$ and the noise statistics. For simplicity (and also for practical reasons), we make the mild assumption that there is no prior knowledge about the input values favoring any particular locations. This implies that the support of $x$ is chosen uniformly from the $\binom{n}{k}$ possible choices.

Our discussion in this section is for error metric 1, but can be tailored to other purposes too. Recall that for the first metric, we are interested in recovering the support of $x$ based on $m$ measurements from (1). The error probability in recovering the support lower-bounds that of exact signal recovery. This can be easily seen by imagining a genie which tells the receiver the non-zero components in the order of their appearance.

We need some notation to proceed. Let us define the following:

$\alpha$ - the vector of non-zero values of $x$, in descending order of magnitude, the $i$-th entry being $\alpha_i$.
$\beta$ - the non-zero values of $x$ in the order of appearance.
$I_0$ - the set of indices of $x$ with zero magnitude.
$p(\alpha_i)$ - the index in $x$ corresponding to the $i$-th entry of $\alpha$.
$R(k, \alpha, \sigma^2)$ - the capacity region of a $k$-user single-antenna Gaussian MAC with channel gains $\alpha$ and input constraints as in (6).

Let $\hat{x}$ be the recovered vector using some decoding method. In this section, we assume that $m$ is large enough, with respect to $k$ and in relation to $n$, to ensure that the probability of decoding error tends to zero as $m$, $n$ and $k$ tend to infinity.



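The error event can be written in terms of a random variable $\Phi$, which is defined as

$$ \Phi = \mathbb{1}\left( \bigcap_{i : x_i = 0} \{\hat{x}_i = 0\} \ \cap \bigcap_{i : x_i \neq 0} \{\hat{x}_i \neq 0\} \right). \tag{7} $$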



Given the $k$ non-zero symbols $\beta$, $\Phi$ is induced by a uniform distribution on the $\binom{n}{k}$ possible supports of the vector $x$. In many practical cases, $\beta$ is drawn from some distribution. Our results can be extended to handle this, but presently we stick to fixed $\beta$, and we assume all the components of $\beta$ are distinct. The latter assumption merely saves some notation and has no bearing on the technical details. The average error probability now becomes

$$ P_{\mathrm{error}} = \Pr(\Phi = 0). \tag{8} $$

The following lemma yields a lower bound on m, the number 
of measurements required for asymptotically exact support 
recovery. 

Lemma 3.1: For a given $\beta$ with $k$ non-zero elements, if $P_{\mathrm{error}}$ goes to zero with $m$, then

$$ m \ge \frac{k \log(n/k)}{R_{\mathrm{cMAC}}(k, \alpha, \sigma^2)}, \tag{9} $$

where

$$ R_{\mathrm{cMAC}}(k, \alpha, \sigma^2) = \min_{\alpha^*} \max_{R \in \mathbb{R}^k} |R| \, \mathbb{1}\{R \in R(k, \alpha^*, \sigma^2)\} \tag{10} $$

and $\alpha^*$ is any permutation of the channel coefficients $\alpha$.
The proof of this lemma proceeds in a number of stages. In the next few paragraphs, we explain the essential ideas behind it. The arguments that we present shed light on some of the underlying bottlenecks in the detection problem.

To obtain a bound as above, we map the support recovery (SR) problem to a communication problem and then establish the connection between the number of measurements $m$ and the required number of channel uses in the communication model or, alternatively, the maximal rate at which error-free transmission is possible.

In principle, the communication setup that we describe can simulate any strategy for the support recovery problem. We briefly describe how any SR problem comes under our communication setup; see the figure below.

[Figure: communication setup for support recovery, showing an encoder with input $p(\alpha_i)$ and side information.]
Recall the notation introduced in paragraph 3 of this section. Consider $k$ encoders trying to communicate information to a decoder. Each encoder corresponds to a non-zero value of the input vector $x$ in the support recovery problem. Perform a random permutation of the set $I_0$ and partition it into $k$ subsets $\{U_1, \ldots, U_k\}$, and provide these to the encoders as side information. The decoder is given the index set $U_i$ of each encoder's inputs, as well as the channel coefficient from that encoder. Clearly this system can emulate the SR problem. We now take an alternate view to keep the discussion as simple as possible. We describe a setup where the sparse vector $x$ for the SR problem and the messages for the corresponding communication problem are generated together. There is no loss of generality in coupling the two systems like this.

To this end, randomly permute the indices of $x$ and partition them into $k$ sets $S_1, S_2, \ldots, S_k$. To partially emulate the SR problem, the support of $x$ is chosen by selecting one element from each of these sets; the selected elements are the indices of the support of $x$. This selection corresponds to message selection in a $k$-user communication channel, in which $S_i$ is the message set of user $i$. A simple method of communication is for user $i$ to encode the chosen message by sending the corresponding column of $A$ directly (rather like a CDMA scheme, with no additional coding), and the decoder then receives

$$ y = \sum_{i=1}^{k} \beta_{\pi(i)} u_i + z, $$

where $u_i$ is the column corresponding to the message chosen by user $i$, and $\pi$ is a random permutation of $\{1, 2, \ldots, k\}$ that assigns a component of $x$ to each user. The decoder is given the vector $(\beta_{\pi(i)})_{i=1}^{k}$ as side information. This coherent $k$-user faded AWGN communication channel is a partial emulation of the CS decoding problem in (1), except that here the decoder has more information: it knows that each $S_i$ contains exactly one index from the support of the vector $x$, and it knows the corresponding value of $x$ in that component, namely $\beta_{\pi(i)}$. Note that user $i$ is conveying $\log |S_i|$ bits to the decoder, and the total number of bits being conveyed is $\sum_{i=1}^{k} \log |S_i|$. The decoder in this communication setup must do at least as well as the CS decoder in the original problem, so these bits are being conveyed reliably.

The above simple CDMA communication scheme is valid for the $k$-user faded AWGN channel in which, in general, each user is allowed to encode his messages using symbols taken from the same alphabet as the symbols in $A$, and each codeword satisfies the power constraint (6). Since the permutation $\pi$ is selected randomly, this is a compound MAC, and the rate region can in principle be calculated. In a compound MAC, the transmitter knows only a set of possible MACs from which one realization will be picked [5]. We do not go into the details of the coding theorems; rather, we merely use the results on the achievable maximal sum-rate. The compound MAC capacity region is contained in the intersection of the MAC capacity regions of the individual components; in our case, the sum-rate is at best that in (10). The lemma is proved by noting that the communication scheme requires successful communication of $k \log(n/k)/m$ bits per channel use when we choose each set $S_i$ to have $n/k$ indices. This rate must be upper-bounded by the sum-rate of the compound MAC.
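The counting argument above can be turned into a small calculator. The sketch below treats the compound MAC sum-rate as a given number of bits per channel use, which is an assumption of the example since computing that sum-rate is the hard part. It returns the smallest integer $m$ for which $k \log_2(n/k)/m$ does not exceed that rate. The function name is illustrative.

```python
import math

def min_measurements(n, k, sum_rate_bits):
    """Smallest integer m with k * log2(n / k) / m <= sum_rate_bits."""
    required_bits = k * math.log2(n / k)  # one of n/k indices per user, k users
    return math.ceil(required_bits / sum_rate_bits)
```

For example, with $n = 1024$ and $k = 16$ each user conveys $\log_2 64 = 6$ bits, so 96 bits in total must be delivered. At an assumed sum-rate of 2 bits per channel use this needs at least 48 measurements.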



Corollary 3.2: For a Gaussian measurement ensemble,



m > max ■ 



21ogf 



2k log 



log(l + a^/a2) log(l 



and when ak/a << 1.0, 



m > 



(7^ log ■ 



(11) 



(12) 



The corollary follows from Lemma 3.1 by noting that the
maximal sum-rate in the compound MAC setting is less 
than A:log(l + since this is the sum of the single user 
constraints. The expression in (12) is identical to that obtained in [13], and it can be further tightened by an alternative approach. Consider the above compound MAC when we take $S_1$ to have size $n - k + 1$ and $|S_i| = 1$ for $i > 1$. In this case, user 1 is conveying $\log(n - k + 1)$ bits to the decoder, and the other users are conveying zero bits, since the decoder knows a priori that these users have only one index (corresponding to telling the CS decoder $k - 1$ elements of the support set as side information). The single-user rate constraint then tells us that

log(n — fc + 1) 



m > 



log(l + a>2)- 



(13) 



Corollary 3.3: If the entries of the measurement matrix are drawn according to Bernoulli(1/2) on $\{+1, -1\}$,

$$ m \ge \frac{2 k \log_2(n/k)}{\log_2(\pi e k / 2)}. \tag{14} $$

With $\{+1, -1\}$ as the input alphabet, we can see that this channel has a sum-rate strictly less than that available in a $k$-user binary-input adder channel [3]. The achievable sum-rate there is half the denominator in (14). This bound can be made tighter by considering an adder channel with noise, but we do not pursue it here.
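As a numerical illustration, the sketch below evaluates the bound for the random sign ensemble, reading (14) as $m \ge 2k \log_2(n/k) / \log_2(\pi e k/2)$. That reading is an assumption: it is consistent with the statement that the adder-channel sum-rate is half the denominator, and with the known large-$k$ sum capacity $\frac{1}{2}\log_2(\pi e k/2)$ of the $k$-user binary adder channel. The function name is illustrative.

```python
import math

def bernoulli_converse_bound(n, k):
    """Evaluate 2 k log2(n/k) / log2(pi e k / 2) for the +/-1 ensemble."""
    return 2 * k * math.log2(n / k) / math.log2(math.pi * math.e * k / 2)
```

For $n = 1024$ and $k = 16$ the bound evaluates to roughly 31.5, so at least 32 measurements are needed under this reading.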

Bounding the number of measurements as above also gives insight into the speed at which the recovery error decays exponentially; this is given in the following lemma.

Lemma 3.4: The error probability in support recovery obeys

$$ P_{\mathrm{error}} \ge \exp\big(-E_0(\alpha_k, \sigma^2)\, m\big), \tag{15} $$

where $E_0(\alpha, \sigma^2)$ is the cut-off rate of a standard scalar AWGN channel with power constraint $\alpha^2/\sigma^2$.

Notice that in the compound MAC we consider, the error probability in the scalar channel with gain $\alpha_k$ lower-bounds the total error probability. The best exponent of error decay for this channel is given by the above $E_0(\cdot)$, which is also the maximal error exponent, attained at zero rate. We can extend this result to include the sphere-packing and straight-line bounds; this is part of ongoing work.

IV. Related Work 

A direct comparison can be made between our work and that of [1]. In that paper, it was shown that Gaussian measurement matrices are asymptotically optimal for joint typicality decoders with $O(k)$ measurements, at fixed SNR, for each error metric defined here. We extend this result to show that these sufficient conditions also hold for centered subgaussian measurement matrices in the linear sparsity regime. Necessary conditions are also established in [1] using arguments based on MACs; however, their bounds are not as refined as ours.

In [13], necessary and sufficient conditions are given for error metric 1. Sufficient conditions were established using an ML decoder, while the necessary conditions exploited a corollary of Fano's inequality. By comparing results in [13] and [12], it was shown that, in the sublinear sparsity regime, Lasso is essentially information-theoretically optimal. However, in the linear regime, no practical algorithm has achieved the $\Omega(k \log(n - k))$ bound established in our paper and in Fletcher et al. [6, Theorem 1].

The results of Fletcher et al. [6] are the closest to ours in terms of the scaling bounds achieved. After submitting a first version of this paper, we noticed that [6] describes some good bounds for the Gaussian case, along with a detailed comparison with existing bounds. Our converse bound generalizes their result, and we believe it is comparable for specific instances. A detailed study along this direction will be included in the final manuscript.

Partial support recovery was also addressed in [10], where necessary conditions are given. There, a general bound was derived for deterministic and stochastic signals. A bound focussed strictly on Fourier measurement matrices is found in [7], which uses Fano's inequality to establish the bound. Compared with the necessary condition in [10, Theorem 3.2], Lemma 3.1 is tighter and also more general, as it applies to a variety of measurement ensembles. Lemma 3.1 is general enough to apply to structured codewords, such as Fourier measurement matrices, although the codewords then have a dependence. However, one needs to compute the capacity region of a compound MAC using these structured codewords.

V. Conclusion 

We have analyzed schemes for sparse signal recovery using subgaussian measurement matrices. Our achievability scheme used an impractical decoder. Future work will address the performance of subgaussian matrices with practical decoders.



where 

It can be shown that the supremum is achieved for t — ^ [1 — 

^(2e72 + 7i)"^^^] and that 



and that $g(\epsilon) = \lambda$. To prove (5), we note that $\log \mathbb{E}[e^{tV}] \le \gamma_1 t^2$ for $-1/\gamma_2 < t < 0$. The result then follows.

References 

[1] M. Akcakaya and V. Tarokh. Shannon theoretic limits on noisy compressive sensing. IEEE Trans. Info. Theory, 2007. Preprint, http://arxiv.org/PS_cache/arxiv/pdf/0711/0711.0366v1.pdf.

[2] L. Birgé and P. Massart. Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli, 4(3):329-375, 1998.

[3] S.-C. Chang and J. K. Wolf. On the T-user M-frequency noiseless multiple-access channel with and without intensity information. IEEE Trans. Info. Theory, 27(1):41-48, January 1981.

[4] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley and Sons, Inc., 2nd edition, 2006.

[5] I. Csiszár and J. Körner. Information Theory. Akadémiai Kiadó, 3rd edition, 1981.

[6] A. K. Fletcher, S. Rangan, and V. K. Goyal. Necessary and sufficient conditions on sparsity pattern recovery. 2008. Preprint, http://arxiv.org/abs/0804.1839.

[7] M. Gastpar and Y. Bresler. On the necessary density for spectrum-blind nonuniform sampling subject to quantization. In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, pages 1633-1636, Istanbul, Turkey, June 2000.

[8] N. L. Johnson, S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions, volume 1. John Wiley and Sons, 2nd edition, 1995.

[9] T. Mikosch. Estimates for tail probabilities of quadratic and bilinear forms in subgaussian random variables. Probability and Mathematical Statistics, 11(2):169-178, 1991.

[10] G. Reeves. Sparse signal sampling using noisy linear projections. Technical Report UCB/EECS-2008-3, Univ. of California, Berkeley, Dept. of Elec. Eng. and Comp. Sci., January 2008.

[11] M. Rudelson and R. Vershynin. The smallest singular value of a random rectangular matrix. 2008. Preprint, http://www-personal.umich.edu/~romanv/papers/rv-rectangular-matrices.pdf.

[12] M. Wainwright. Sharp thresholds for noisy and high-dimensional recovery of sparsity using $\ell_1$-constrained quadratic programming. IEEE Trans. Info. Theory, 2006. Preprint, http://www.eecs.berkeley.edu/~wainwrig/Papers/Wai_SharpThres.pdf.

[13] M. Wainwright. Information-theoretic bounds on sparsity recovery in the high-dimensional and noisy setting. In ISIT '07, pages 961-965, June 2007.



Appendix 

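We sketch the proof of the concentration result, based on a modification of arguments by Birgé and Massart in [2]. Let $\epsilon = \gamma_2 \lambda + \sqrt{2 \gamma_1 \lambda}$. We first prove (4) by bounding $V$ using Chernoff's bound,

$$ \Pr(V \ge \epsilon) \le \exp\left( \inf_{t} \big( -t\epsilon + \log \mathbb{E}[e^{tV}] \big) \right). $$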
Since V satisfies the moment condition, 

logE[e'^] < -7it - ^ log(l - 72t) < 2{l-\t) 
we have 

Pr(F > e) < exp(-g(e)) 



